\documentclass{article}

\usepackage[T1]{fontenc}
\usepackage[osf]{libertine}
\usepackage{microtype}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{psfrag}
\usepackage{usbib}
\usepackage{auto-pst-pdf}
\usepackage{bm}
\usepackage{url}
\usepackage{nicefrac}
\usepackage{subfig}

\usepackage[letterpaper, textwidth=15cm, textheight=20cm]{geometry}

\bibliographystyle{myusmeg-a}
\renewcommand{\citenamefont}[1]{\textsc{\MakeLowercase{#1}}}
\renewcommand{\bibnamefont}[1]{\textsc{#1}}
\renewcommand*{\bibname}{biblography}

\newcommand{\deq}{\triangleq}
\newcommand{\psff}[1]{\input{figures/#1.tex}\includegraphics{figures/#1.eps}}
\newcommand{\cm}[1]{\mathcal{#1}}
\newcommand{\data}{\cm{D}}
\newcommand{\given}{\mid}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\intd}[1]{\,\mathrm{d}#1}

\DeclareMathOperator{\poisson}{Poisson}
\DeclareMathOperator*{\argmax}{arg\,max}

\title{Bayesian inference for nonstationary, nonhomogeneous Poisson
  processes with changepoints}

\author{
  Roman Garnett, Jeff Schneider\\
  {Carnegie Mellon University}\\
  {5000 Forbes Avenue}\\
  {Pittsburgh, Pennsylvania 15213}\\
  \url{rgarnett@andrew.cmu.edu}\\
  \url{schneide@cs.cmu.edu}
  \and
  Michael A Osborne\\
  {University of Oxford}\\
  {Parks Road}\\
  {Oxford \textsc{ox}1 3\textsc{pj}}\\
  {United Kingdom}\\
  \url{mosb@robots.ox.ac.uk}
  \and
  Richard P Mann\\
  {Uppsala Universitet}\\
  {Uppsala 751\,06}\\
  {Sweden}\\
  \url{rmann@math.uu.se}
}

\begin{document}

\maketitle

\begin{abstract}
  We present a method for performing inference about point processes
  that might undergo sudden changes in their dynamics.  In particular,
  we consider the log-Gaussian Cox process, which places a Gaussian
  process prior distribution on the logarithm of the intensity
  function of a nonhomogeneous Poisson process.  We model potential
  changepoints in the intensity function using nonstationary
  changepoint covariance functions and will perform fully
  probabilistic changepoint detection by approximating the posterior
  distribution over putative changepoint locations.  We will also
  approximate the full posterior distribution over the intensity
  function by marginalizing out the model hyperparameters, in
  particular the locations of potential changepoints.  Finally, we
  will analyze two real-world datasets using the proposed techniques.
\end{abstract}

\section{Introduction}

The analysis of nonstationary point processes with changepoints in
their intensity is an important practical problem and has been subject
to a great deal of research.  In nearly all---if not all---work on
this subject models the observed events as being generated by a
nonhomogeneous Poisson process.  Previous research has typically
modeled the intensity function associated with the process as being
stepwise constant, either with a single jump \citep{rafteryakman,
  carlin, westogden} or multiple jumps \citep{youngkuo, fearnhead,
  adamscp}.  These methods have not assumed correlation between the
constant intensity function values between changepoints.

To address correlation in the intensity function, log-Gaussian Cox
processes have been employed \citep{moller}, which place a
log-Gaussian process prior distribution on the intensity function and
allow for flexible modeling of the latent function's properties
\citep{gpml}.  Changepoints in the dynamics of Gaussian processes have
been studied \citep{cpcj, bocpd}, but these models have not been
applied to nonstationary point processes.

Here we will employ Gaussian-process changepoint models to perform
inference in log-Gaussian Cox processes with intensity functions
suspected of changepoints.  This method is very powerful, allowing for
the modeling of a wide variety of changepoints in single processes,
including changes in output and input scales, amplitude and period in
periodic signals, and continuous and drastic discontinuous changes in
covariance function. We will introduce a novel means of managing smooth changes. We will also demonstrate how one could model a
change in the correlation coefficient between two correlated
processes.  To the best of our knowledge, this is the first time such
a changepoint has been addressed, in any context.

The covariance functions that will be used to model suspected
changepoints will introduce more hyperparameters to the Gaussian
process prior.  Here we will use a fully Bayesian approach,
marginalizing the model hyperparameters when making predictions and
finding the posterior distributions over the hyperparameters when
desired.  The latter will be particularly important when analyzing the
locations of potential changepoints.

The remainder of this paper is arranged as follows.  In the next
section we will introduce Poisson processes, Cox processes, Gaussian
processes, and in particular the log-Gaussian Cox process.  In Section
\ref{changepoints}, we will briefly introduce a few examples of
possible changes in dynamics that can be modeled by appropriately
constructed covariance functions in a Gaussian-process framework.  In
Section \ref{results}, we will apply the methods discussed in this
paper to two real-world datasets displaying several types of
changepoints, including a change between two correlated processes.
Finally, we will conclude with a brief discussion.

\section{Nonhomogeneous Poisson Processes and Inference}
\label{poisson}

A \emph{Poisson process} is a stochastic process that models events as
being generated independently from each other with a given rate.  Let
$\cm{X} \subseteq \R^d$ be a Euclidean domain and let $\xi \deq
\sum_{i = 1}^n \delta(x_i)$ be a finite counting measure-valued random
variable on $\cm{X}$, where $\delta$ is the Dirac measure, $n \in \N$
is a natural number-valued random variable, and $\lbrace x_i
\rbrace_{i = 1}^n \subset \cm{X}$ is a random finite set of points.
Let $\lambda\colon \cm{X} \to \R^+$ be an arbitrary nonnegative
function on $\cm{X}$ and let $\lbrace X_i \rbrace_{i = 1}^m$ be a
finite set of disjoint, bounded Borel-measurable subsets of $\cm{X}$.
Define $X \deq \cup_i X_i$.  The Poisson process is characterized by
\begin{align}
  p\bigl(\xi(X) \given \lambda \bigr) 
  &= 
  \prod_{i=1}^m p\bigl(\xi(X_i)\given \lambda \bigr),
  \nonumber
  \\
  p\bigl(\xi(X_i) \given \lambda \bigr)
  &=
  \poisson\left( \textstyle \int_{X_i} \lambda \right),
  \label{realprobability}
\end{align}
where the integral of $\lambda$ is taken with respect to the Lebesgue
measure.

The function $\lambda$ is called the \emph{intensity} function, and
more events are expected to be observed in regions where it is higher.
If $\lambda$ is a constant function, the process is
\emph{homogeneous}; otherwise, it is \emph{nonhomogeneous}.  The input
domain $\cm{X}$ is often taken to be a single temporal dimension,
however spatial and spatiotemporal Poisson process models are also
common, for example, in geostatistics \citep{diggle}.

\subsection{Cox processes}

The \emph{Cox process} provides a common and effective mechanism for
performing inference about nonhomogeneous Poisson processes with an
unknown intensity function.  In a Cox process, the intensity function
is assumed to be generated by a stochastic process separate from the
Poisson process itself.  The generating process can be anything,
although a common approach is the \emph{log-Gaussian} Cox process,
where a Gaussian process prior is placed on the logarithm of the
latent intensity.  Such a model provides a powerful nonparametric
Bayesian approach to the problem, and the Gaussian process prior in
particular provides a flexible mechanism for modeling the latent
intensity.  We will briefly introduce Gaussian processes below and
then discuss inference in the context of Poisson processes.

\subsubsection{Gaussian processes}

Gaussian processes provide a simple, flexible framework for performing
Bayesian inference about functions \citep{gpml}.  A Gaussian process
is a distribution on the functions $f\colon \cm{X} \to \R$ with the
property that the distribution of the function values at a finite
subset of points $F \subseteq \cm{X}$ is multivariate Gaussian.  A
Gaussian process is completely defined by its first two moments: a
mean function $\mu\colon \cm{X} \to \R$ and a symmetric positive
semidefinite covariance function $K\colon \cm{X} \times \cm{X} \to
\R$.  The mean function describes the overall trend of the function
and is typically set to a constant for convenience.  The covariance
function describes how function values are correlated as a function of
their locations in the domain, thereby encapsulating information about
the overall shape and behavior of the signal.  Many covariance
functions are available to model a wide variety of anticipated
signals.

Suppose we have chosen a Gaussian process prior distribution on the
function $f\colon \cm{X} \to \R$ and a finite set of input points
$\bm{x}$.  The prior distribution on $\bm{f} \deq f(\bm{x})$ is
\begin{equation*}
 p(\bm{f} \given \bm{x}, \theta)
 =
 \cm{N}
 \bigl(
   \bm{f};
   \mu(\bm{x}; \theta),
   K(\bm{x}, \bm{x}; \theta)
 \bigr),
\end{equation*}
where $K(\bm{x}, \bm{x}; \theta)$ is the Gram matrix of the points
$\bm{x}$, and $\theta$ is a vector containing any parameters required
of $\mu$ and $K$, which comprise hyperparameters of the model.

\subsubsection{The log-Gaussian Cox process}

In the log-Gaussian Cox process \citep{moller, diggle}, we place a
Gaussian process prior on $\nu \deq \log \lambda$:
\begin{equation*}
  p(\nu \given \theta) 
  \deq 
  \cm{GP}\bigl(\nu; \mu(\cdot; \theta), K(\cdot, \cdot; \theta)\bigr).
\end{equation*}
Log-Gaussian Cox processes have several useful features, including
providing a natural nonparametric method for inferring the intensity
function from observations and the ability to work in continuous time.

Given some observed events $\data \deq \lbrace x_i \rbrace_{i=1}^n
\subset \cm{X}$, our goal is to find the posterior distribution of
$\nu$ conditioned on those events,
\begin{equation}
  \label{nuposterior}
  p(\nu \given \data)
  =
  \frac
  {
    \int p(\data \given \nu)
    p(\nu \given \theta)
    p(\theta) \intd{\theta}
  }
  {
    \iint p(\data \given \nu)
    p(\nu \given \theta) 
    p(\theta) 
    \intd{\theta} \intd{\nu}
  }.
\end{equation}
The likelihood is given by
\begin{equation}
  \label{hard}
  p(\data \given \nu) 
  =
  \exp\left( -\int_{\cm{V}} \exp \nu \right) \prod_{i = 1}^n \exp \nu(x_i),
\end{equation}
which is unfortunately not tractable due to the integral over $\nu$.
To make things worse, the additional integrals over $\theta$ and $\nu$
in \eqref{nuposterior} are also not tractable.\footnote{\citep{adams}
  notes that \eqref{nuposterior} is ``doubly intractable,'' due to
  having two intractable integrals.  With the addition of the integral
  over $\theta$ here, the expression becomes triply intractable!}  We
will therefore have to resort to various approximations.  In this
manuscript, we focus on modeling rather than inference and will use
fairly simple approximations for convenience.

First, we will approximate $p(\data \given \nu)$ via discretization,
suggested by \citet{moller}.  Let $\lbrace X_i \rbrace_{i=1}^m$, $\cup
X_i = \cm{X}$ be a finite partition of $\cm{X}$, where each $X_i$ is
bounded and Borel-measurable. We will assume a constant (log)
intensity on each region $X_i$, which we will notate with $\nu_i$, and
approximate the likelihood with
\begin{equation}
  \label{approximatelikelihood}
  p(\data \given \nu) 
  \approx
  \prod_{i=1}^m
  \poisson\bigl(\xi(X_i); m(X_i) \exp \nu_i\bigr),
\end{equation}
where $m$ is the Lebesgue measure.  This statement agrees with
\eqref{realprobability}, except that we have replaced $\int_{X_i}
\lambda$ with the rectangular approximation $m(X_i) \exp \nu_i$.
\citet{ghosh} showed that \eqref{approximatelikelihood} is consistent
as $\max_i m(X_i) \to 0$.

Many schemes are available for approximating or sampling from the
posterior $p(\nu \given \data, \theta)$.  In our experiments below, we
will use the simple \emph{Laplace approximation}
\citep{williamsbarber}, which can be efficiently calculated and which
performed well for our purposes.  Expectation propagation
\citep{minka} provides a popular alternative.  \citet{inla} has
proposed yet another approximation for models with a Gaussian-process
prior on latent functions called integrated nested Laplace
approximation (\textsc{inla}). \citet{adams} and \citet{ess} gave
algorithms for sampling exactly from the posterior using Markov chain
Monte Carlo techniques that do not require discretization or other
approximations.  The tradeoff for these more-complicated but
potentially more-accurate methods is a (sometimes substantial)
increase in computational cost.  In our experiments, the Laplace
approximation proved highly effective while being very inexpensive to
compute.

Let $\bm{\nu} \deq [\nu_1, \dotsc, \nu_m]^\top$ represent the latent
log intensities on the partition elements $\lbrace X_i \rbrace$.  In
the Laplace approximation, we approximate the posterior $p(\bm{\nu}
\given \data, \theta)$ with a Gaussian distribution obtained by taking
a second-order Taylor expansion of the likelihood around its mode.
Let
\begin{align*}
  \hat{\bm{\nu}}
  &\deq 
  \argmax_{\bm{\nu}} p(\bm{\nu} \given \data, \theta), \\
  \Sigma 
  &\deq 
  -\nabla\nabla \log p(\bm{\nu} \given \data, \theta)
  \,
  \Bigr\rvert_{\bm{\nu} = \hat{\bm{\nu}}},
\end{align*}
be the posterior mode of the likelihood and the Hessian of the
negative log likelihood around the mode, respectively.  The Laplace
approximation is 
\begin{equation*}
  p(\nu \given \data, \theta) 
  \approx
  \cm{N}(\hat{\bm{\nu}}, \Sigma^{-1}).
\end{equation*}
The Laplace approximation has the convenient property that the
posterior is approximated with a Gaussian process.

\section{Changepoints in Intensity Functions}
\label{changepoints}

In many applications of log-Gaussian Cox processes (e.g.,
\citep{moller, diggle, adams}), the distribution of $\lambda$ is
assumed to be translation-invariant (stationary).  Here we are
interested in processes where the intensity function is explicitly not
assumed to be stationary---changepoints in the dynamics of the process
are localized and break translation invariance.  Stationary models
have been applied \citep{adams} even when modeling a process that is
suspected of having undergone a change; here we propose a method for
making this simplification unnecessary.

The core of our approach will be the application of Gaussian process
covariance functions designed to model sudden changes in their
associated hyperparameters \citep{cpcj}.  These models are based on
mechanisms that can be conveniently ``plugged-into'' standard
off-the-shelf stationary covariances to handle changes in their
most-commonly shared features (such as input and output scales).

For ease of exposition, let us consider the family of one-dimensional
stationary covariance functions, which we may write as
\begin{equation*}
  K(t, t'; \omega, \sigma)
  \deq
  \omega^2
  \kappa\left( \frac{\lvert t - t' \rvert}{\sigma} \right),
\end{equation*}
where $\kappa\colon \R \to \R$ is a kernel function, and $\omega$ and
$\sigma$ serve as output and input scales, respectively.  The output
scale defines the overall process variance, and the input scale
defines a characteristic length in the input domain of the correlation
across function values---larger input scales typically indicate more
long-scale dependence and therefore smoother functions.

Extending the constructions below to the multidimensional case is not
difficult.  In many applications, the input space will be a single
temporal dimension, in which case the covariances below can be used
without modification.

A variety of different types of changepoints can be effectively
modeled via appropriately constructed covariance functions
\citep{cpcj, thesis}.  Below we will cover only those used in the
experiments below; however, the previous references may be consulted
for more examples and information.

\subsection{A sudden change in output scale}

Suppose an intensity function is suspected of having undergone a
sudden change in output scale at time $t_c$.  Let $\omega_1$ and
$\omega_2$ represent the output scale before and after the changepoint,
respectively.  We may model such a function with the covariance
\begin{equation*}
  K_{\text{CO}}(t, t'; \omega_1, \omega_2, \sigma, t_c)
  \deq 
  a(t; \omega_1, \omega_2, t_c)
  K(t, t'; \omega = 1, \sigma)
  a(t'; \omega_1, \omega_2, t_c),
\end{equation*}
where the $a$ function serves to provide the desired nonstationarity:
\begin{equation}
  \label{outchange}
  a(t; \omega_1, \omega_2, t_c) 
  \deq
  \begin{cases}
    \omega_1 & t < t_c; \\
    \omega_2 & t \geq t_c.
  \end{cases}
\end{equation}

Notice that function values are correlated across the changepoint,
despite the change in dynamics.  This is an advantage of the approach
we use here---other suggested methods have assumed independence
across the changepoint boundary \citep{adamscp, bocpd}, which is not
necessary and is often not true.

The covariance $K_{\text{CO}}$ is demonstrated in Figure
\ref{fig:changeoutput}.

\begin{figure}
  \centering
  \psff{changepointcovsamples3} \\
  \bigskip
  \psff{changepointcov3}\medskip
  \caption{Illustration of the covariance function $K_{\text{CO}}$
    modeling a change in output scale at time $t = \text{5}$.  The
    component covariance $K$ was taken to be the squared exponential
    covariance function $K_{\text{SE}}$ \eqref{sqdexp} with input
    scale $\sigma = {1}$.  The output scale changes from $\omega
    = {1}$ before the changepoint to $\omega = \nicefrac{3}{2}$
    after. Top: Prior distribution showing mean and pointwise $\pm
    {2}$ standard-deviation bounds, along with five samples from
    the prior.  The jump discontinuity at the changepoint has been
    removed for clarity.  Bottom: Prior covariance function
    $K_{\text{CO}}(4, t)$, along with $K_{\text{SE}}(4, t)$ for
    reference.  }
  \label{fig:changeoutput}
\end{figure}

\subsection{A sudden change in input scale}

Suppose an intensity function is suspected of having undergone a
sudden change in input scale at time $t_c$.  Let $\sigma_1$ and
$\sigma_2$ represent the output scale before and after the changepoint,
respectively.  We may model such a function with the covariance
\begin{equation*}
  K_{\text{CI}}(t, t'; \omega, \sigma_1, \sigma_2, t_c)
  \deq 
  K
  \bigl(
    u(t; \sigma_1, \sigma_2, t_c), 
    u(t; \sigma_1, \sigma_2, t_c); 
    \omega, \sigma = 1 
  \bigr),
\end{equation*}
where the $u$ function serves to provide the desired nonstationarity:
\begin{equation}
  \label{inchange}
  u(t; \sigma_1, \sigma_2, t_c) 
  \deq
  \begin{cases}
    \frac{t}{\sigma_1} 
    & t < t_c; \\
    \frac{t_c}{\sigma_1} + \frac{t - t_c}{\sigma_2} 
    & t \geq t_c.
  \end{cases}
\end{equation}

The covariance $K_{\text{CI}}$ is demonstrated in Figure
\ref{fig:changeinput}.

\begin{figure}
  \centering
  \psff{changepointcovsamples2} \\
  \bigskip
  \psff{changepointcov2}\medskip
  \caption{Illustration of the covariance function $K_{\text{CI}}$
    modeling a change in input scale at time $t = \text{5}$.  The
    component covariance $K$ was taken to be the squared exponential
    covariance function $K_{\text{SE}}$ \eqref{sqdexp} with output
    scale $\omega = 1$.  The input scale changes from $\sigma = 1$
    before the changepoint to $\sigma = \nicefrac{5}{2}$ after. Top:
    Prior distribution showing mean and pointwise $\pm 2$
    standard-deviation bounds, along with five samples from the prior.
    Bottom: Prior covariance function $K_{\text{CI}}(4, t)$, along
    with $K_{\text{SE}}(4, t)$ for reference.  }
  \label{fig:changeinput}
\end{figure}

\subsection{A sudden change in correlation}

Let us suppose now that we have two functions of interest,
$\lambda_1(t)$ and $\lambda_2(t)$, whose function values are
correlated with correlation $\rho$ until a point $t_c$, after which
time the function values become uncorrelated.  We will write the input
domain here as $\cm{X} \deq \lbrace 1, 2 \rbrace \times \R$, where the
first discrete coordinate identifies which of the two functions an
observation is associated with.  We can then model this situation with
the covariance
\begin{equation}
  \label{correlation}
  K_{\textsc{CC}}\bigl( [t, \ell], [t', \ell'] ; \omega, \sigma, \rho, t_c \bigr)
  \deq
  \begin{cases}
    \phantom{\rho}K(t, t'; \omega, \sigma) 
    & \ell = 1, \ell' = 1; \\
    \rho K(t, t'; \omega, \sigma) 
    & \ell = 1, \ell' = 2, t' < t_c; \\
    \rho K(t, t'; \omega, \sigma) 
    & \ell = 2, \ell' = 1, t < t_c; \\
    \phantom{\rho}K(t, t'; \omega, \sigma) 
    & \ell = 2, \ell' = 2, t < t_c, t' < t_c; \\
    \phantom{\rho}K(t, t'; \omega, \sigma) 
    & \ell = 2, \ell' = 2, t \geq t_c, t' \geq t_c; \\
    \phantom{\rho}0 & \text{otherwise.}
  \end{cases}
\end{equation}
Notice that $\lambda_2$ is forced to have a break in correlation with
itself across the changepoint boundary.  This is required by the
assumptions of the model---if $\lambda_1$ and $\lambda_2$ are to be
uncorrelated with each other after $t_c$, they cannot both remain
correlated with themselves (and therefore each other) across the
boundary.  This asymmetry in the model should be addressed when
modeling such behavior.

The covariance $K_{\text{CC}}$ is demonstrated in Figure
\ref{fig:changecorrelation}.

\begin{figure}
  \centering
  \psff{correlationcp_samples_1} \\
  \bigskip
  \psff{correlationcp_samples_2}\medskip
  \caption{Illustration of the covariance function $K_{\text{CC}}$
    modeling a change in correlation between two signals at time $t =
    \text{5}$.  The component covariance $K$ was taken to be the
    squared exponential covariance function $K_{\text{SE}}$
    \eqref{sqdexp} with output scale $\omega = 1$ and input scale
    $\sigma = 1$.  The correlation changes from $\rho = 0.9$ before
    the changepoint to $\rho = 0$ after. Top: Prior distribution
    showing mean and pointwise $\pm {2}$ standard-deviation
    bounds for signal 1, along with five samples from the prior.
    Bottom: Prior distribution showing mean and pointwise $\pm
    {2}$ standard-deviation bounds for signal 1, along with five
    samples from the prior.  }
  \label{fig:changecorrelation}
\end{figure}

\subsection{Remarks}

We may make a few notes on the covariance functions defined above.
First, note that these covariance functions (as well as the further
examples shown in \citep{cpcj, thesis}) may be be combined to model
changes in multiple hyperparameters.  For example, suppose that
a function were suspected of having undergone a change in both 
output and input scale at time $t_c$.  We may model this by combining
\eqref{outchange} and \eqref{inchange} to form
\begin{multline}
  \label{bothchange}
  K_{\text{CB}}
  (t, t'; \lambda_1, \lambda_2, \sigma_1, \sigma_2, t_c)
  \\
  \deq
  a(t; \lambda_1, \lambda_2, t_c)
  K
  \bigl(
    u(t; \sigma_1, \sigma_2, t_c),
    u(t'; \sigma_1, \sigma_2, t_c)
    ;
    \lambda = 1, \sigma = 1
  \bigr)
  a(t'; \lambda_1, \lambda_2, t_c).
\end{multline}
Multiple changepoints can also be trivially incorporated into any of
the covariances after introducing additional hyperparameters.

In certain situations, it might also be useful to use the covariances
defined above after transforming the inputs in some way.  For example,
if we were to apply the transformation
\begin{equation*}
  t(x; \gamma) 
  \deq 
  x 
  \mapsto 
  \Biggl[ 
    \cos \biggl( \frac{2 \pi x}{\gamma} \biggr),
    \sin \biggl( \frac{2 \pi x}{\gamma} \biggr)
  \Biggr]^\top,
\end{equation*}
we could model periodic functions with period $\gamma$.  If we were to
then employ \eqref{outchange} or \eqref{inchange}, we could then
handle a change in this signal's period or amplitude.

Finally, additional changepoint covariances have been defined if
desired.  \citet{cpcj} describe a method for modeling both continuous
and drastic changepoints in the covariance function itself, as well as
various mechanism for handling faulty observations in the regression
case with Gaussian noise that might continue to be useful with a
Poisson likelihood in certain circumstances.

\section{Results}
\label{results}

We have tested the framework described above on two real-world
datasets.  The first is a well-known dataset exhibiting a changepoint
in intensity, and the second is a pair of point processes that undergo
a change in correlation.

\subsection{Coal-mining data}

\begin{figure}
  \centering
  \subfloat[][]{
    \psff{posterior}
    \label{posterior}
  }
  \\
  \subfloat[][]{
    \psff{changepoint}
    \label{changepoint}
  }
  \caption{Analysis of the coal-mining data.  \subref{posterior}:
    Inferred intensity function for the process generating the
    disasters.  The rug plot shows the times of the disasters, and the
    posterior distribution over the intensity function is indicated by
    its mean function, shown in dark blue, as well as a pointwise
    $95$\% credible interval, shown in light blue.
    \subref{changepoint}: Posterior distribution over changepoint
    location.  }
  \label{coal}
\end{figure}

The first dataset comprises 190 events over the time period from March
15, 1851 to March 22, 1962; each represents a coal-mining disaster
that killed at least ten people in the United Kingdom.  These data,
first presented in their most commonly encountered form by
\citet{jarrett}, have been analyzed several times in the context of
nonhomogeneous Poisson processes, because external factors are
suspected of having affected the rate of disasters throughout this
period.  The events are indicated by the rug plot along the axis of
Figure \ref{coal}\subref{posterior}.

We modeled these data with a nonhomogeneous Poisson process with a
Gaussian process prior on $\nu$ as described in Section \ref{poisson}.
The prior mean of the Gaussian process on $\nu$ was taken to be the
constant zero function, and the covariance function was chosen to be
the squared exponential covariance function
\begin{equation}
  \label{sqdexp}
  K_{\text{SE}}(t, t'; \omega, \sigma)
  \deq
  \omega^2
  \exp
  \left(
  \frac{ \lvert t - t' \rvert^2}{2\sigma^2}
  \right),
\end{equation}
modified to handle a simultaneous change in both output and input
scale as in \eqref{bothchange}.

Our model therefore had five associated hyperparameters: the output
scales $\omega_1$ and $\omega_2$, the input scales $\sigma_1$ and
$\sigma_2$, and the time of changepoint $t_c$.  Calculating the
posterior distribution $p(\nu \given \data)$ requires marginalizing
these unknown parameters \eqref{nuposterior}.  To accomplish this, we
placed noninformative priors on all five hyperparameters and sampled
from their joint posterior distribution via slice sampling.  For
convenience, the allowed changepoint locations were discretized to a
grid with six months separating each point.  Approximately $4.5 \times
10^6$ samples were taken in total, as well as additional, discarded
samples that served as ``burn-in'' for the Markov chain to mix.  The
posterior $p(\nu \given \data)$ was then approximated as a Gaussian
process mixture of the approximate posterior distributions conditioned
on the sampled hyperparameter values.

Figure \ref{coal}\subref{posterior} shows the approximate posterior
distribution $p(\lambda \given \data)$; the distribution was
appropriately transformed from the log domain.  We also approximated
the posterior distribution over changepoint location alone, $p(t_c
\given \data)$, using Bayesian Monte Carlo to approximate the required
integral \citep{bmc} and estimates of the marginal probabilities from
the samples described above.  The result is shown in Figure
\ref{coal}\subref{changepoint}.

There seems to be evidence of a changepoint sometime during the period
from 1870--1890, agreeing with the findings of other research
\citep{jarrett, youngkuo, rafteryakman, westogden, fearnhead,
  adamscp}.  Previous authors have noted that the parliament of the
United Kingdom passed several acts regulating coal mines during this
period, with the intent of increasing safety for workers.  In
particular, the Coal Mines Regulation Acts of 1872 and 1887
constituted massive reform.  Our analysis here agrees with previous
work that these acts were probably beneficial.  The addition hump near
1950 might be related to the Mines and Quarries Act, 1954, which
imposed further safety regulation.  \citet{fearnhead} located the same
hump but with more posterior probability mass around 1950, and
\citet{carlin} and \citet{adamscp} with a posterior more similar to
ours.

We make a few notes on this analysis.  Notice that the posterior
variance of $p(\lambda \given \data)$ is higher in the region from
1870--1890; this corresponds with our uncertainty in $t_c$ and
reflects principled Bayesian analysis.  Furthermore, notice that the
posterior mean prior to the putative changepoint is quite flat.  The
hyperparameter samples with the highest posterior probability in fact
tend to fit the data with an essentially homogeneous Poisson process
during the region from 1851--1870 (by having a very large $\sigma_1$),
followed by a nonhomogeneous process with a much smaller $\sigma_2$.
Indeed, \citet{jarrett} noted that the events in the first $\approx
\!30\,\text{years}$ appeared to be generated by a homogeneous process.

An additional advantage of our approach is the flexibility afforded by
allowing such nonstationarity in a consistent framework.

\begin{figure}
  \centering
  \subfloat[direct fire][]{
    \psff{direct}
    \label{direct}
  }
  \\
  \subfloat[indirect fire][]{
    \psff{indirect}
    \label{indirect}
  }
  \caption{Analysis of the \textsc{sigacts} data.  \subref{direct}:
    Inferred intensity function for the process generating direct-fire
    incidents.  The rug plot shows the times of the events, and the
    posterior distribution over the intensity function is indicated by
    its mean function, shown in dark blue, as well as a pointwise
    $95$\% credible interval, shown in light blue.  The rug plot has
    been thinned by a factor of 30 for ease of
    visualization. \subref{indirect}: Inferred intensity function for
    the process generating indirect-fire incidents; the properties of
    the plot are equivalent to those in \subref{direct}.}
  \label{sigacts}
\end{figure}

\subsection{\textsc{sigacts} data}

Our next dataset comprises two event streams corresponding to reports
written by \textsc{us} soldiers on significant activities
(\textsc{sigacts}) during the War in Afghanistan from January 1, 2007
to December 31, 2008, binned by day.  The data are shown in the rug
plots of Figure \ref{sigacts}.  Figure \ref{sigacts}\subref{direct}
shows incidents involving direct fire on \textsc{nato}--\textsc{isaf}
coalition forces (9\,133 in total), and \ref{sigacts}\subref{indirect}
shows incidents involving indirect fire (5\,760 in total).  There
appears to be correlation between the rate of these two types of
incidents until sometime in the second half of 2008, at which point
there is a sharp increase in the number of direct-fire incidents.
Indirect-fire incidents, however, show a decrease throughout this
period.

We modeled these data using the change-in-correlation covariance
\eqref{correlation}, where the generic covariance $K$ was again chosen
to be the squared exponential \eqref{sqdexp}. We assumed a nonnegative
correlation $\rho$ between the two intensity functions before an
unknown changepoint $t_c$ and no correlation thereafter.  The
indirect-fire incidents were chosen to have the necessary break in
correlation across the changepoint due to having fewer events overall.
Finally, the intensity functions were assumed to have a shared output
and input scale across the entire domain.

Noninformative priors were again placed on the hyperparameters $\theta
\deq (\omega, \sigma, \rho, t_c)$, placing the priors on the
logarithms of $\omega$ and $\sigma$ due to their being
nonnegative. Initial analysis of the data via slice sampling from the
posterior distribution of $\theta$ indicated that the marginal
distribution of $\omega$ was tightly centered on $\log
\nicefrac{1}{2}$ and that the marginal distribution of $\sigma$ was
tightly centered on $\log(45\, \text{days})$.  We fixed these parameters
at these values and concentrated further analysis on $t_c$ and $\rho$.

We again used slice sampling to sample from the posterior
distributions of $\rho$ and $t_c$; the joint posterior is shown in
Figure \ref{sigactsjoint}.  There is strong evidence of a changepoint
in either March or (much more localized) in mid-August, 2008.  There
is also strong evidence of a significant correlation prior to the
putative changepoint.  The posterior mean $\mathbb{E}[\rho \given
  \data]$ is approximately $0.71$.  The observed changepoint in
mid-August, 2008 might be related to a significant increase in
activity along the Pakistan border than began on 3 September 2008.

We note that modeling these intensity functions using the flexible
change-in-correlation model resulted in a significant benefit: the
data are $3.15 \times 10^{135}$ (Bayes factor $317$\,nats) times more
likely to have been generated under this model than under a model that
assumes no correlation between intensity functions and no changepoint,
$4.80 \times 10^{4}$ (Bayes factor $10.8$\,nats) times more likely
than under a model that assumes correlation but no changepoint, and
$9.04 \times 10^{68}$ (Bayes factor $159$\,nats) times more likely
than under a model that assumes a changepoint but no correlation.

\begin{figure}
  \centering \psff{sigacts_posterior}
  \caption{The posterior distribution over the correlation $\rho$ and
    the date of $t_c$ over the year 2008 for the \textsc{sigacts}
    data.  The contour plot shows the log probability $p(\rho, t_c
    \given \data)$ and spans 20 orders of magnitude (base $e$).  
    Darker colors indicate less probability mass.
  }
  \label{sigactsjoint}
\end{figure}

Figures \ref{sigacts}\subref{direct} and \subref{indirect} show the
full posterior $p(\nu \given \data)$ after marginalizing out all
hyperparameters $\theta$, including $t_c$.  Again we can see an
increase in posterior variance near where putative changepoint
locations are likely.  Additionally, the model compels the posterior
over intensity functions to be strongly correlated in the earlier
dates.

\section{Conclusion}

We have suggested a fully Bayesian framework for modeling
nonstationary point processes that undergo sudden changepoints in
their behavior.  Our proposal involves placing appropriately
constructed Gaussian-process prior distributions on the log intensity
function for use in a log-Gaussian Cox process.  We may then make
predictions about the intensity by approximating the full posterior
$p(\nu \given \data)$ via the marginalization of the model
hyperparameters and approximate the posterior over model
hyperparameters as well, in particular changepoint location $p(t_c
\given \data)$, if desired.

This work can be extended in several ways.  First, the method
presented here models changepoints that occur suddenly at a single
localized point in time, which causes the process to become
mean-square nondifferentiable (or even mean-square discontinuous) at
the changepoint boundary.  We would like to investigate models that
incorporate smooth changes in dynamics over periods of time with
adaptive length.  This would allow the discovery of a change occurring
over an unknown interval of time.  Additionally, we believe that a
more flexible model for changes in correlation between processes can
be constructed by treating the data as being generated by three
(rather than two) mutually correlated intensity functions with
appropriately selected parameters, with the tradeoff of having
additional hyperparameters to handle.  We note, however, that the
simple model presented here worked very well for our application.

\bibliography{nhpp}

\end{document}
